Introduction to Linux

Operating systems are suites of programs that make the computer work. UNIX is one of the oldest, dating back to the late 1960s. It has been under constant development and is a stable, multi-user, multi-tasking system, popular for running the servers used in bioinformatics analysis. There are different flavors of UNIX and UNIX-like systems, the most popular of which is Linux, which is free and open source.

Parts of the Linux operating system

Linux is made up of the kernel, the shell and programs.

Kernel

The kernel is low-level code that makes everything happen. It controls file access, memory allocation, and all the other tasks the computer needs to do.

The Linux kernel should not be confused with the Python (or R) kernels that run the code in Jupyter notebooks (see the Kernel menu). These kernels also handle low-level functions, but specifically for their programming languages. They sit on top of the Linux kernel, which works in the background.

Shell

The shell is the interface between the user and the kernel: it interprets commands and carries them out. Most often you interact with the shell through a command-line interface in a terminal. Python notebooks in Jupyter also let you execute commands in the Linux shell directly. You simply prefix a command with an exclamation point:


In [1]:
! echo Hello, World!


Hello, World!

This allows you to run system commands and incorporate them into your workflow.

How you get to the shell depends a lot on your particular flavor of UNIX, but Jupyter will take care of all our shell needs for the purpose of this exercise.

Programs

Most of the time you will be running some kind of program (such as Jupyter) to accomplish what you want. Linux typically comes with a rich suite of built-in programs, which allow you to perform many kinds of useful tasks, from file manipulation to list sorting. You can also install others, or write your own, depending on your needs.

Files and processes

Everything in UNIX is either a file or a process. A process is a running program. It gets a unique ID (its process ID, or PID), and the kernel tracks its state. A file is a collection of data that lives on the hard drive. Files are organized hierarchically into directories (aka folders).


In [2]:
! tree ..


..
├── data
│   └── reads
│       ├── mutant1_OIST-2015-03-28.fq.gz
│       ├── mutant2_OIST-2015-03-28.fq.gz
│       ├── mutant3_OIST-2015-03-28.fq.gz
│       ├── mutant4_OIST-2015-03-28.fq.gz
│       └── reference_OIST-2015-03-28.fq.gz
├── Dockerfile
├── examples
│   ├── deleteme.txt
│   └── Sir Robin.txt
├── index.ipynb
├── LICENSE.txt
├── README.md
├── ref
│   ├── NC_012967.fasta
│   ├── NC_012967.fasta.fai
│   └── NC_012967.gff
└── src
    ├── homework
    │   └── Introduction to Linux homework.ipynb
    ├── Introduction to Linux.ipynb
    ├── Introduction to Python.ipynb
    └── Raw data.ipynb

6 directories, 18 files

The tree command gives us an overview of the underlying file structure, which is hierarchical in nature. In this case we run it with the path .., which refers to the directory one level above the current one. We see that there are four folders in the project directory: data, examples, ref and src. The directories contain files, and data and src also contain sub-directories (reads and homework). The current notebook, Introduction to Linux.ipynb, resides in the src directory. Jupyter executes shell commands in the folder where the notebook is, which is why we needed the .. reference to get an overview of the whole project.

Just in case you're curious, here is what the other files are:

  • Dockerfile controls how the Docker container image for this environment is built. It contains the instructions for installing and configuring the Jupyter kernels and programs we will use, as well as for downloading the data
  • index.ipynb is the landing page, which gives the general overview
  • LICENSE.txt explains how all this code can be used
  • README.md is the file you used to launch the Binder instance

Interacting with files

One of the principal tasks you will perform in the shell is manipulating files and directories.

Copying, renaming and moving files

Files can be copied using the command cp source destination. Let's give it a try. First, we'll examine the current directory using the ls command, which lists its contents:


In [3]:
! ls


homework		     Introduction to Python.ipynb
Introduction to Linux.ipynb  Raw data.ipynb

Now we can copy a file here from the examples folder, and look again.


In [4]:
! cp ../examples/Sir\ Robin.txt example.txt
! ls


example.txt  Introduction to Linux.ipynb   Raw data.ipynb
homework     Introduction to Python.ipynb

We copied a file called Sir Robin.txt into the current folder and named the copy example.txt. If we had wanted to keep the original name, we could have used . as the destination: cp ../examples/Sir\ Robin.txt .

As you may have guessed, . refers to the current directory, whereas .. refers to the one above it. These are examples of relative file paths, i.e., paths specified relative to another location (here, the current directory). Every file also has an absolute file path, its unique address in the file system, which starts from the root directory /. Relative paths let you address files in nearby directories, so (a) you save typing and (b) your code keeps working even if you move your project directory elsewhere.
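A minimal sketch of how relative paths resolve to absolute ones, which you could run in a %%bash cell or a terminal. It builds a throwaway tree in a temporary directory (the names project and src are hypothetical) and uses realpath, a GNU coreutils command that may be missing on some non-Linux systems, to print the absolute path each relative path points to.

```shell
# Build a hypothetical project tree in a temporary directory
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
mkdir -p "$tmp/project/src"
cd "$tmp/project/src"

# pwd prints the absolute path of the current directory
echo "current directory: $(pwd)"

# Relative paths are resolved starting from the current directory
here=$(realpath .)    # the current directory itself
up=$(realpath ..)     # one level up, i.e. .../project
echo ".  resolves to $here"
echo ".. resolves to $up"
```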

Now, if the file is named Sir Robin.txt, why did we have to type Sir\ Robin.txt in the command? The shell uses whitespace to separate the arguments of cp, as it does for most Linux commands, but here the space is also part of the file name. Putting a backslash, known as an escape character, in front of the space tells the shell to treat it as part of the file name.
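Besides the backslash, quoting the whole name also keeps it together as one argument. A minimal sketch comparing the options in a throwaway temporary directory (the copy names are hypothetical):

```shell
# Work in a throwaway directory
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Create a file whose name contains a space
echo "Brave Sir Robin" > "Sir Robin.txt"

# Three equivalent ways to pass the name to cp as ONE argument
cp Sir\ Robin.txt copy1.txt      # backslash escapes the space
cp "Sir Robin.txt" copy2.txt     # double quotes
cp 'Sir Robin.txt' copy3.txt     # single quotes

# Without escaping, the shell splits the name at the space, so cp receives
# three arguments (Sir, Robin.txt, copy4.txt) and fails
cp Sir Robin.txt copy4.txt 2>/dev/null || echo "unescaped name failed"
```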

You can move or rename files using the mv command, in the same manner as copying them:


In [5]:
! mv example.txt example.txt.bak
!ls


example.txt.bak  Introduction to Linux.ipynb   Raw data.ipynb
homework	 Introduction to Python.ipynb
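mv renames a file when the destination is a new name, and moves it when the destination is an existing directory. A minimal sketch in a throwaway temporary directory (the backup directory name is hypothetical):

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

echo "Brave Sir Robin" > example.txt
mkdir backup

# Rename: the destination is a new name in the same directory
mv example.txt example.txt.bak

# Move: the destination is an existing directory, so the file keeps its name
mv example.txt.bak backup/
ls backup
```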

Making directories, navigation and wildcards

Directories are fundamental to organizing your work. You can create one using the mkdir command, enter it using cd, and copy some files into it:


In [6]:
%%bash
cp ../examples/Sir\ Robin.txt example.txt
mkdir myDir
cd myDir
cp ../example.* .
ls


example.txt
example.txt.bak

This code snippet introduces a few new things. First, we see an IPython magic, a keyword that enables special commands. In this case %%bash executes the rest of the cell in the Linux shell, which is more convenient than typing ! before every line when there are many commands. Second, we see the wildcard character *, which matches any sequence of characters. Wildcards let us specify multiple files at once: here the shell expands ../example.* into every name in the directory above that starts with example., and cp copies all of them into the current directory. Another useful wildcard is ?, which matches exactly one character:


In [7]:
! cp example.txt Example.txt
! ls ?xample.txt


example.txt  Example.txt
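Because the shell, not cp or ls, expands wildcards, you can preview what a pattern matches with printf before using it in a real command. A minimal sketch with hypothetical empty files in a temporary directory:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical empty files to match against
touch example.txt example.txt.bak Example.txt notes.txt

# * matches any run of characters (including none)
printf '%s\n' example.*

# ? matches exactly one character, so it catches both spellings
printf '%s\n' ?xample.txt

# Count the matches for each pattern
star_count=$(printf '%s\n' example.* | wc -l)
qmark_count=$(printf '%s\n' ?xample.txt | wc -l)
```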

Notes

  • Linux filenames are case-sensitive.
  • Each ! line and each %%bash cell runs in its own shell, which starts in the directory where the notebook resides. So if you execute a command like cd somedir in one cell, the next cell will not start in somedir.
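The second note can be reproduced in a plain shell: commands in parentheses run in a child shell (a subshell), and a cd there does not change the working directory of the parent, much as a cd run through ! or %%bash does not move the notebook. A minimal sketch, using /tmp as an arbitrary target:

```shell
start=$(pwd)

# The parentheses run these commands in a subshell
( cd /tmp && echo "inside the subshell: $(pwd)" )

# The parent shell never left its original directory
after=$(pwd)
echo "before: $start"
echo "after:  $after"
```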

Filename conventions

You can see that filenames generally have the form *.*, with the base name before the dot and the extension after it. By convention, the extension indicates the file type; Linux itself does not enforce this. There are standard extensions, such as txt for text files, and in bioinformatics many file types have standard extensions of their own, which we'll deal with later.
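Since the extension is simply the text after the last dot, the shell can split a file name into base name and extension with parameter expansion. A minimal sketch using a file name from the ref directory shown earlier:

```shell
f="NC_012967.fasta"

base="${f%.*}"   # strip the shortest trailing ".something"  -> NC_012967
ext="${f##*.}"   # strip everything up to the last dot        -> fasta

echo "base name: $base"
echo "extension: $ext"
```

For a compressed file such as mutant1_OIST-2015-03-28.fq.gz, the same expansions give mutant1_OIST-2015-03-28.fq and gz, because only the last extension is split off.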

Removing files and folders

We can remove files and folders using the rm and rmdir commands, respectively. Because rmdir only removes empty directories, we need to delete a directory's contents first:


In [8]:
%%bash
rm myDir/*
rmdir myDir
ls


example.txt
Example.txt
example.txt.bak
homework
Introduction to Linux.ipynb
Introduction to Python.ipynb
Raw data.ipynb
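To see why the contents have to go first, here is a minimal sketch in a throwaway temporary directory: rmdir refuses a non-empty directory and succeeds once it is empty.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

mkdir myDir
touch myDir/example.txt

# rmdir fails here because myDir still contains a file
if rmdir myDir 2>/dev/null; then
    echo "rmdir succeeded (unexpected)"
else
    echo "rmdir refused: myDir is not empty"
fi

# Delete the contents first, then the directory itself
rm myDir/*
rmdir myDir
```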

Displaying file contents

At this point you may be wondering what's inside some of these files. We can display the contents of a text file using cat:


In [9]:
!cat example.txt


The Ballad of Brave Sir Robin

Bravely bold Sir Robin rode forth from Camelot.
He was not afraid to die, O brave Sir Robin!
He was not at all afraid to be killed in nasty ways,
Brave, brave, brave, brave Sir Robin!
He was not in the least bit scared to be mashed into a pulp,
Or to have his eyes gouged out, and his elbows broken;
To have his kneecaps split, and his body burned away;
And his limbs all hacked and mangled, brave Sir Robin!

His head smashed in and his heart cut out
And his liver removed and his bowels unplugged
And his nostrils raped and his bottom burned off

Often you will be dealing with large files and only want to display a small part of them, say the beginning or the end:


In [10]:
! head -5 example.txt


The Ballad of Brave Sir Robin

Bravely bold Sir Robin rode forth from Camelot.
He was not afraid to die, O brave Sir Robin!
He was not at all afraid to be killed in nasty ways,

In [11]:
! tail -5 example.txt


And his limbs all hacked and mangled, brave Sir Robin!

His head smashed in and his heart cut out
And his liver removed and his bowels unplugged
And his nostrils raped and his bottom burned off
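The -5 form used above is a traditional shorthand; the POSIX spelling is -n 5, which GNU head and tail also accept. A minimal sketch on a generated stand-in file, so no real data is needed (the numbers.txt file is hypothetical):

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A stand-in "large" file with the numbers 1 to 100, one per line
seq 100 > "$tmp/numbers.txt"

first=$(head -n 3 "$tmp/numbers.txt")   # lines 1-3
last=$(tail -n 3 "$tmp/numbers.txt")    # lines 98-100

echo "$first"
echo "$last"
```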

Searching file contents

You can search the contents of files using the grep command, which prints lines matching a given query:


In [12]:
! grep Robin example.txt


The Ballad of Brave Sir Robin
Bravely bold Sir Robin rode forth from Camelot.
He was not afraid to die, O brave Sir Robin!
Brave, brave, brave, brave Sir Robin!
And his limbs all hacked and mangled, brave Sir Robin!

Useful grep switches

The behavior of grep, like that of many programs, can be changed by adding command-line switches, such as these:

  • -v display those lines that do NOT match
  • -n precede each matching line with the line number
  • -c print only the total count of matched lines

For example, -c counts the lines that mention Robin:

! grep -c Robin example.txt


5
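A minimal sketch of all three switches on a small stand-in file, so it runs without example.txt. The three lines are made up for illustration; two of them contain Robin.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
f="$tmp/ballad.txt"

# A small stand-in for example.txt: lines 1 and 3 mention Robin
printf '%s\n' "Brave Sir Robin ran away" "Bravely ran away, away" "Brave, brave Sir Robin!" > "$f"

grep -n Robin "$f"    # lines 1 and 3, prefixed with "1:" and "3:"
grep -v Robin "$f"    # only line 2, which does not mention Robin
grep -c Robin "$f"    # the number of matching lines: 2

matches=$(grep -c Robin "$f")
others=$(grep -vc Robin "$f")
```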

Redirection and pipes

The shell lets you redirect the output of a command to a file or pass it on to another command. The > character redirects output to a file, overwriting the file if it already exists:


In [13]:
! head -5 example.txt > example2.txt
! cat example2.txt


The Ballad of Brave Sir Robin

Bravely bold Sir Robin rode forth from Camelot.
He was not afraid to die, O brave Sir Robin!
He was not at all afraid to be killed in nasty ways,

You can append to a file instead using >>:


In [14]:
! head -5 example.txt >> example2.txt
! cat example2.txt


The Ballad of Brave Sir Robin

Bravely bold Sir Robin rode forth from Camelot.
He was not afraid to die, O brave Sir Robin!
He was not at all afraid to be killed in nasty ways,
The Ballad of Brave Sir Robin

Bravely bold Sir Robin rode forth from Camelot.
He was not afraid to die, O brave Sir Robin!
He was not at all afraid to be killed in nasty ways,
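A minimal sketch contrasting > and >> on a throwaway file in a temporary directory (the file name out.txt is hypothetical):

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
out="$tmp/out.txt"

echo "first"  > "$out"    # > creates the file (or empties an existing one)
echo "second" > "$out"    # > again: "first" is overwritten
echo "third" >> "$out"    # >> adds a line at the end

cat "$out"                # second, then third
line_count=$(wc -l < "$out")
```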

Commands can be chained together with the pipe, |, which passes the output a command would normally print to the screen (its standard output) to the next command as input:


In [15]:
! grep Robin example.txt | sort


And his limbs all hacked and mangled, brave Sir Robin!
Brave, brave, brave, brave Sir Robin!
Bravely bold Sir Robin rode forth from Camelot.
He was not afraid to die, O brave Sir Robin!
The Ballad of Brave Sir Robin

In this example, we take the output of grep and pass it to the sort program, which sorts the lines alphabetically.
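Pipes can chain more than two commands. A minimal sketch that counts how often each word occurs in a short, made-up word list: sort groups identical lines together, which matters because uniq -c only merges adjacent duplicates, and a final sort -rn lists the most frequent word first.

```shell
# A made-up word list, one word per line
words=$(printf '%s\n' Robin brave Robin Camelot brave Robin)

# sort | uniq -c counts each distinct word; sort -rn orders by count, highest first
echo "$words" | sort | uniq -c | sort -rn

# Keep only the most frequent word
top=$(echo "$words" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')
echo "most frequent: $top"
```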

Homework

Follow the link to the exercise workbook.

Done?

Continue on to Introduction to Python

References